I have two primary datasets and three additional supporting datasets that I will use to investigate this topic.
1: "CORA RTD Data with Stops" Obtained from RTD after submitting a Colorado Open Records Act (CORA) request. This dataset contains performance data for the bus lines SKIP, DASH, FF1, 225, and 225D.
2: "NOAA Weather Data" Daily weather observations obtained via a NOAA Climate Data Online request.
3: "Stops" Information on the locations of specific bus stops, including lat/long. Taken from http://www.rtd-denver.com/GoogleFeeder/
4: "stoptimes" Gives more detailed information about a specific bus and its journey by using trip_IDs. Taken from http://www.rtd-denver.com/GoogleFeeder/
5: "trips" Connects trip_IDs to specific routes. Taken from http://www.rtd-denver.com/GoogleFeeder/
I will be conducting my analysis of bus performance from October 1st, 2019 to November 9th, 2019. The data from the RTD GoogleFeeder link is refreshed each day and lacks key historical performance information. The CORA RTD dataset contains performance information for bus lines, but lacks valuable contextual information.
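The three GoogleFeeder files link together through shared keys: trips maps a trip_id to a route, stoptimes maps a trip_id to the stop_ids it visits, and Stops maps a stop_id to a name and lat/long. A minimal sketch of that chain, using tiny made-up frames rather than the real files:

```python
import pandas as pd

# Tiny stand-ins for the GoogleFeeder files (all values made up for illustration)
trips = pd.DataFrame({'route_id': ['SKIP'], 'trip_id': [113469607]})
stop_times = pd.DataFrame({'trip_id': [113469607, 113469607],
                           'stop_id': [26175, 20171],
                           'stop_sequence': [1, 2]})
stops = pd.DataFrame({'stop_id': [26175, 20171],
                      'stop_name': ['Stop A', 'Stop B'],
                      'stop_lat': [40.00, 40.01],
                      'stop_lon': [-105.27, -105.26]})

# Chain the joins: route -> trips -> stop-level detail -> stop locations
linked = trips.merge(stop_times, on='trip_id').merge(stops, on='stop_id')
```

The same two merges would attach route names and coordinates to any trip in the real files.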
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
import requests, json
from datetime import datetime
import geopandas as gpd
import geoplot, contextily
RTD_performance_df = pd.read_csv('CORA RTD with stops.csv')
RTD_performance_df.head()
#This data contains the scheduled and actual departure times for buses.
#By calculating the difference between scheduled and actual departure times, I can quantify bus performance
#Stop sequence is the given number of stops a bus has made in its line. It is not a bus stop identifier, but rather a count of stops made in a line
| | ACTL_ARR_DT | ACTL_ARR_TM | ACTL_DEP_DT | ACTL_DEP_TM | SCHED_DEP_DT | SCHED_DEP_TM | STOP_SEQUENCE | ROUTE_SHORT_NAME | STOP_NAME |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10/8/2019 | 9:12:47 | 10/8/2019 | 9:12:47 | 10/8/2019 | 9:13:32 | 49 | 225D | Baseline Rd & Meadowbrook Dr |
| 1 | 10/8/2019 | 13:01:10 | 10/8/2019 | 13:01:10 | 10/8/2019 | 12:59:08 | 49 | 225D | Baseline Rd & Meadowbrook Dr |
| 2 | 10/8/2019 | 15:02:45 | 10/8/2019 | 15:02:45 | 10/8/2019 | 15:01:08 | 49 | 225D | Baseline Rd & Meadowbrook Dr |
| 3 | 10/8/2019 | 14:01:55 | 10/8/2019 | 14:01:55 | 10/8/2019 | 13:59:08 | 49 | 225D | Baseline Rd & Meadowbrook Dr |
| 4 | 10/8/2019 | 12:00:59 | 10/8/2019 | 12:00:59 | 10/8/2019 | 11:59:32 | 49 | 225D | Baseline Rd & Meadowbrook Dr |
weather_df = pd.read_csv('NOAA Weather Data.csv')
weather_df['DATE'] = pd.to_datetime(weather_df['DATE'])
weather_df.tail()
#This data contains detailed info on weather for different areas in northwest Colorado
| | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | DATE | DAPR | MDPR | PRCP | SNOW | SNWD | TAVG | TMAX | TMIN | TOBS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4096 | US1COBO0497 | BOULDER 3.2 S, CO US | 39.9807 | -105.2418 | 1647.1 | 2019-11-05 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 4097 | US1COBO0497 | BOULDER 3.2 S, CO US | 39.9807 | -105.2418 | 1647.1 | 2019-11-06 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 4098 | US1COBO0497 | BOULDER 3.2 S, CO US | 39.9807 | -105.2418 | 1647.1 | 2019-11-07 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 4099 | US1COBO0497 | BOULDER 3.2 S, CO US | 39.9807 | -105.2418 | 1647.1 | 2019-11-08 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 4100 | US1COBO0497 | BOULDER 3.2 S, CO US | 39.9807 | -105.2418 | 1647.1 | 2019-11-09 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
stoptimes_df = pd.read_csv('stop_times.csv')
stoptimes_df.head()
#This data lists specific bus trips from stop to stop along a given bus line
#One bus moving from stop to stop has the same trip_ID
#stop_id identifies the specific bus stop that the bus stopped at
#stop sequence is the given number of stops a bus has made in its line so far
| | trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled | timepoint |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 113469607 | 19:19:00 | 19:19:00 | 26175 | 1 | NaN | 0 | 1 | NaN | 1 |
| 1 | 113469607 | 19:20:45 | 19:20:45 | 20171 | 2 | NaN | 0 | 0 | NaN | 0 |
| 2 | 113469607 | 19:21:59 | 19:21:59 | 20094 | 3 | NaN | 0 | 0 | NaN | 0 |
| 3 | 113469607 | 19:22:41 | 19:22:41 | 33371 | 4 | NaN | 0 | 0 | NaN | 0 |
| 4 | 113469607 | 19:23:08 | 19:23:08 | 12522 | 5 | NaN | 0 | 0 | NaN | 0 |
trips_df = pd.read_csv('trips.csv')
trips_df.head()
#This data connects trip_IDs to specific routes
| | route_id | service_id | trip_id | trip_headsign | direction_id | block_id | shape_id |
|---|---|---|---|---|---|---|---|
| 0 | 0 | SA | 113469607 | Union Station | 0 | 0 11 | 1159956 |
| 1 | 0 | SA | 113469608 | Union Station | 0 | 0 4 | 1159954 |
| 2 | 0 | SA | 113469609 | Union Station | 0 | 0 1 | 1159975 |
| 3 | 0 | SA | 113469610 | Union Station | 0 | 0 8 | 1159965 |
| 4 | 0 | SA | 113469611 | Union Station | 0 | 0 11 | 1159965 |
stops_df = pd.read_csv('Stops.csv')
stops_df=stops_df.rename(columns = {'stop_name':'STOP_NAME'})
stops_df.head()
#This data contains detailed info regarding the locations of specific bus stops.
#Contains lat/long info which can be used for spatial analysis
| | stop_id | stop_code | STOP_NAME | stop_desc | stop_lat | stop_lon | zone_id | stop_url | location_type | parent_station | stop_timezone | wheelchair_boarding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26157 | 26157 | 40th Ave & Airport Blvd-Gateway Park Stn Gate D | Vehicles Travelling West | 39.770084 | -104.786985 | 26157 | NaN | 0 | 33740.0 | NaN | 1 |
| 1 | 33820 | 33820 | Longmont | Vehicles Travelling North | 40.149505 | -105.103068 | NaN | NaN | 1 | NaN | NaN | 0 |
| 2 | 33835 | 33835 | Paradise Hills | Vehicles Travelling North | 39.704446 | -105.250268 | NaN | NaN | 1 | NaN | NaN | 0 |
| 3 | 24890 | 24890 | Englewood Station | Vehicles Travelling North | 39.655904 | -104.999736 | B | NaN | 0 | 33712.0 | NaN | 1 |
| 4 | 34118 | 34118 | 20th St / Welton Station | Vehicles Travelling North | 39.748163 | -104.986671 | NaN | NaN | 1 | NaN | NaN | 0 |
First, I need to prepare my primary bus performance dataframe. To quantify bus performance, I am going to convert arrival and departure values to datetime values and calculate the difference between scheduled and actual departure times. This will leave me with the total number of minutes the bus departed the stop early or late.
#clean dataset of stray sentinel dates
RTD_performance_df = RTD_performance_df[RTD_performance_df.ACTL_DEP_DT != '1/1/1902']
#create new column combining actual departure date and time
RTD_performance_df['Actual Depart'] = pd.to_datetime(RTD_performance_df['ACTL_DEP_DT'] + ' ' + RTD_performance_df['ACTL_DEP_TM'])
#create new column combining scheduled departure date and time
RTD_performance_df['Scheduled Depart'] = pd.to_datetime(RTD_performance_df['SCHED_DEP_DT'] + ' ' + RTD_performance_df['SCHED_DEP_TM'])
#subtract scheduled depart from actual depart
RTD_performance_df['Performance'] = RTD_performance_df['Actual Depart'] - RTD_performance_df['Scheduled Depart']
#convert performance to minutes
RTD_performance_df['Performance'] = RTD_performance_df['Performance'].dt.total_seconds() / 60
#keep the date as a datetime column and drop the raw date/time columns
RTD_performance_df['DATE'] = pd.to_datetime(RTD_performance_df['ACTL_ARR_DT'])
RTD_performance_df = RTD_performance_df.drop(columns=['ACTL_ARR_DT', 'ACTL_ARR_TM', 'ACTL_DEP_DT',
                                                      'ACTL_DEP_TM', 'SCHED_DEP_TM', 'SCHED_DEP_DT'])
RTD_performance_df.head()
#the performance column shows positive values if the bus was late, and negative values if the bus was early
| | STOP_SEQUENCE | ROUTE_SHORT_NAME | STOP_NAME | Actual Depart | Scheduled Depart | Performance | DATE |
|---|---|---|---|---|---|---|---|
| 0 | 49 | 225D | Baseline Rd & Meadowbrook Dr | 2019-10-08 09:12:47 | 2019-10-08 09:13:32 | -0.750000 | 2019-10-08 |
| 1 | 49 | 225D | Baseline Rd & Meadowbrook Dr | 2019-10-08 13:01:10 | 2019-10-08 12:59:08 | 2.033333 | 2019-10-08 |
| 2 | 49 | 225D | Baseline Rd & Meadowbrook Dr | 2019-10-08 15:02:45 | 2019-10-08 15:01:08 | 1.616667 | 2019-10-08 |
| 3 | 49 | 225D | Baseline Rd & Meadowbrook Dr | 2019-10-08 14:01:55 | 2019-10-08 13:59:08 | 2.783333 | 2019-10-08 |
| 4 | 49 | 225D | Baseline Rd & Meadowbrook Dr | 2019-10-08 12:00:59 | 2019-10-08 11:59:32 | 1.450000 | 2019-10-08 |
I also need to find the average amount of inclement weather that occurred over the time period. I am going to combine the precipitation and snow columns and create a new column with the average inclement weather amount per day.
grouped_weather = weather_df.groupby('DATE').mean(numeric_only=True)
grouped_weather['incl weather'] = grouped_weather['PRCP'] + grouped_weather['SNOW']
grouped_weather.head()
| DATE | LATITUDE | LONGITUDE | ELEVATION | DAPR | MDPR | PRCP | SNOW | SNWD | TAVG | TMAX | TMIN | TOBS | incl weather |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-10-01 | 40.009590 | -105.222388 | 1789.863793 | 7.5 | 0.297500 | 0.040000 | 0.0 | 0.0 | 41.5 | 62.333333 | 42.000000 | 43.800000 | 0.040000 |
| 2019-10-02 | 40.008274 | -105.218973 | 1782.809836 | 2.0 | 0.183333 | 0.194786 | 0.0 | 0.0 | 41.5 | 52.142857 | 38.571429 | 41.166667 | 0.194786 |
| 2019-10-03 | 40.009597 | -105.225312 | 1801.927027 | NaN | NaN | 0.011273 | 0.0 | 0.0 | 44.0 | 59.125000 | 37.125000 | 43.666667 | 0.011273 |
| 2019-10-04 | 40.004000 | -105.226609 | 1823.035000 | 2.0 | 0.000000 | 0.000306 | 0.0 | 0.0 | 51.5 | 64.250000 | 35.750000 | 41.833333 | 0.000306 |
| 2019-10-05 | 40.005541 | -105.220527 | 1809.416327 | NaN | NaN | 0.000206 | 0.0 | 0.0 | 44.5 | 63.250000 | 36.250000 | 47.333333 | 0.000206 |
In order to get a broad overview of how the individual bus lines performed over the time period, I am going to perform a groupby on "STOP_NAME" and "DATE" for each bus line.
#group one line's rows by stop and date, keeping only the Performance column
def group_line(route):
    routes = RTD_performance_df[RTD_performance_df['ROUTE_SHORT_NAME'] == route]
    return routes.groupby(["STOP_NAME", "DATE"])[['Performance']].sum()

skip_grouped = group_line('SKIP')
_225_grouped = group_line('225')
_225D_grouped = group_line('225D')
_225T_grouped = group_line('225T')
dash_grouped = group_line('DASH')
FF1_grouped = group_line('FF1')
#example of grouped bus line with summed performance
skip_grouped.head()
| STOP_NAME | DATE | Performance |
|---|---|---|
| 27th Way & Broadway PnR | 2019-10-01 | 152.483333 |
| | 2019-10-02 | 183.566667 |
| | 2019-10-03 | 122.466667 |
| | 2019-10-04 | 219.550000 |
| | 2019-10-05 | 298.133333 |
#The following plots will show bus performance over time for each specific bus line
skip_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='SKIP', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
FF1_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='FF1', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
dash_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='DASH', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
_225_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='225', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
_225D_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='225D', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
_225T_grouped.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='225T', figsize=(15,15)).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
#All bus line performance over time
#removed legend because it is too long and ugly
grouped_perf_df = RTD_performance_df.groupby(["STOP_NAME", "DATE"])[['Performance']].sum()
grouped_perf_df.reset_index().pivot(index='DATE', columns='STOP_NAME', values='Performance').plot(
    title='All Lines', figsize=(15,15), legend=False)
#a simpler plot
grouped_perform = RTD_performance_df.groupby('DATE').sum(numeric_only=True)
grouped_perform['Performance'].plot(figsize = (7,7))
Judging by the individual bus line performance plots, as well as the combined bus performance plot, there are specific days on which bus performance was undoubtedly impacted. Keep in mind that the higher the spike, the later buses were arriving at their stops. October 29th in particular saw severe delays across nearly all bus lines.
Let's get an idea of what the weather was like during the same time period.
grouped_weather['incl weather'].plot(figsize = (7,7))
The inclement weather plot has spikes very similar to those in the bus performance plots. Let's put the two on the same plot and do some further investigation.
#create copy of df
perf_weather = grouped_perform.copy()
#make new column for weather
perf_weather['Weather'] = grouped_weather['incl weather']
#delete unneeded data from new df
del perf_weather['STOP_SEQUENCE']
#new column for the date
perf_weather['Date'] = perf_weather.index
perf_weather.head()
| DATE | Performance | Weather | Date |
|---|---|---|---|
| 2019-09-30 | -8.250000 | NaN | 2019-09-30 |
| 2019-10-01 | 43516.016667 | 0.040000 | 2019-10-01 |
| 2019-10-02 | 42369.533333 | 0.194786 | 2019-10-02 |
| 2019-10-03 | 47113.683333 | 0.011273 | 2019-10-03 |
| 2019-10-04 | 60475.566667 | 0.000306 | 2019-10-04 |
#plot the bus performance and weather data together with different y axes
fig, ax = plt.subplots()
perf_weather.plot(x = 'Date', y = 'Performance', ax = ax)
perf_weather.plot(x = 'Date', y = 'Weather', ax = ax, secondary_y = True)
There appear to be similar spikes between worse bus performance and inclement weather, especially on October 11th and October 29th. To be fair, though, October 24th saw fairly bad weather and buses do not appear to have been severely impacted.
However, showing that bus performance can be impacted by bad weather isn't a particularly revolutionary idea. I think it would be more interesting to highlight a specific day when both bus performance and weather were bad, and look at which specific bus lines and bus stops had the best and worst performance.
#create new df only showing data for Oct 29th, 2019
single_day_perf = RTD_performance_df[RTD_performance_df['DATE'] == '2019-10-29'].copy()
del single_day_perf['STOP_SEQUENCE']
single_day_perf.head()
| | ROUTE_SHORT_NAME | STOP_NAME | Actual Depart | Scheduled Depart | Performance | DATE |
|---|---|---|---|---|---|---|
| 10534 | 225D | Public Rd & South Boulder Rd | 2019-10-29 14:14:43 | 2019-10-29 14:09:55 | 4.800000 | 2019-10-29 |
| 10535 | 225D | Public Rd & South Boulder Rd | 2019-10-29 12:23:00 | 2019-10-29 12:09:55 | 13.083333 | 2019-10-29 |
| 10536 | 225D | Public Rd & South Boulder Rd | 2019-10-29 13:21:57 | 2019-10-29 13:09:55 | 12.033333 | 2019-10-29 |
| 10537 | 225D | Public Rd & South Boulder Rd | 2019-10-29 11:12:27 | 2019-10-29 11:09:55 | 2.533333 | 2019-10-29 |
| 10538 | 225D | Public Rd & South Boulder Rd | 2019-10-29 09:36:43 | 2019-10-29 09:07:55 | 28.800000 | 2019-10-29 |
#plot each bus line's total performance on Oct 29th.
line_totals = single_day_perf.groupby("ROUTE_SHORT_NAME").sum(numeric_only=True)
sb.catplot(y="Performance", x="ROUTE_SHORT_NAME", data=line_totals.reset_index(), kind="bar", height=5)
It appears from this bar plot that SKIP and DASH were the worst performing bus lines on October 29th, while the 225 performed the best.
Finding the best and worst performing stops in each bus line
skip_routes = single_day_perf[single_day_perf['ROUTE_SHORT_NAME'] == 'SKIP']
skip_grouped = skip_routes.groupby("STOP_NAME").sum(numeric_only=True)
skip_grouped.sort_values('Performance').tail()
| STOP_NAME | Performance |
|---|---|
| Broadway & Euclid Ave | 881.050000 |
| Broadway & Canyon Blvd | 885.766667 |
| Broadway & Iris Ave | 886.350000 |
| Broadway & Alpine Ave | 894.716667 |
| Broadway & Walnut St | 906.600000 |
ff1_routes = single_day_perf[single_day_perf['ROUTE_SHORT_NAME'] == 'FF1']
ff1_grouped = ff1_routes.groupby("STOP_NAME").sum(numeric_only=True)
ff1_grouped.sort_values('Performance').tail()
| STOP_NAME | Performance |
|---|---|
| Broadway & Regent Dr | 1145.066667 |
| Downtown Boulder Station (Ar) | 1302.650000 |
| Broadway & Euclid Ave | 1537.350000 |
| S Broadway & Dartmouth Ave | 1547.183333 |
| Broadway & Baseline Rd | 1674.966667 |
dash_routes = single_day_perf[single_day_perf['ROUTE_SHORT_NAME'] == 'DASH']
dash_grouped = dash_routes.groupby("STOP_NAME").sum(numeric_only=True)
dash_grouped.sort_values('Performance').tail()
| STOP_NAME | Performance |
|---|---|
| Broadway & University Ave | 724.883333 |
| Broadway & Baseline Rd | 727.316667 |
| Table Mesa Dr & Tantra Dr | 739.483333 |
| Broadway & Euclid Ave | 743.816667 |
| Broadway & Canyon Blvd | 753.116667 |
_225T_routes = single_day_perf[single_day_perf['ROUTE_SHORT_NAME'] == '225T']
_225T_grouped = _225T_routes.groupby("STOP_NAME").sum(numeric_only=True)
_225T_grouped.sort_values('Performance').tail()
| STOP_NAME | Performance |
|---|---|
| Broadway & University Ave | 60.733333 |
| Broadway & Regent Dr | 61.966667 |
| Broadway & Euclid Ave | 62.983333 |
| Broadway & Canyon Blvd | 65.266667 |
| Downtown Boulder Station Gate E | 66.633333 |
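The tail() calls above list each line's most-delayed stops; the least-delayed stops come from the same sort with head(). A minimal sketch with made-up totals (in the notebook this would run on each of the *_grouped frames):

```python
import pandas as pd

# Hypothetical per-stop delay totals for one line (values made up for illustration)
totals = pd.DataFrame({'Performance': [906.6, 62.9, 743.8, 12.5]},
                      index=['Stop A', 'Stop B', 'Stop C', 'Stop D'])

ranked = totals.sort_values('Performance')
best = ranked.head(3)    # least-delayed stops
worst = ranked.tail(3)   # most-delayed stops (what the .tail() calls above return)
```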
The least delayed stops for each line on October 29th:

| Line | Bus Stops |
|---|---|
| FF1 | Union Station Gate B17, Park Ave West & Wewatta St, Union Station Gate B6 |
| Dash | E Spaulding St & Dove Cove, E Spaulding St & S Public Rd, W Baseline Rd & N Cornelius St |
| Skip | Downtown Boulder Station Gate H, Broadway & 20th St, Broadway & Hawthorne Ave |
| 225D | Diamond Cir & Black Diamond Dr, Aspen Ridge Dr & Diamond Cir, S Public Rd & Laser St |
| 225T | Broadway & 20th St, Broadway & College Ave, Tenino Ave & Oneida St |
The most delayed stops for each line on October 29th:

| Line | Bus Stops |
|---|---|
| FF1 | Broadway & Baseline Rd, S Broadway & Dartmouth Ave, Broadway & Euclid Ave |
| Dash | Broadway & Canyon Blvd, Broadway & Euclid Ave, Table Mesa Dr & Tantra Dr |
| Skip | Broadway & Walnut St, Broadway & Alpine Ave, Broadway & Iris Ave |
| 225D | W Baseline Rd & N Cornelius St, Lafayette PnR Gate B, US 287 & Dillon Rd |
| 225T | Downtown Boulder Station Gate E, Broadway & Canyon Blvd, Broadway & Euclid Ave |
Yes. If I were to take the data literally, DASH and SKIP perform the worst in inclement weather. However, the analysis of individual bus stop performance for each line shows that stops on the FF1 line had much worse performance on October 29th. DASH and SKIP may only appear to perform worse because they have more stops overall.
Somewhat. Two storms hit Colorado between October 1st and November 9th. During the storm that hit October 28th and 29th, overall bus performance was severely delayed. A similar, though less dramatic, performance hit was observed during another storm on October 11th. However, October 24th also saw inclement weather, and the data shows only a negligible hit to bus performance.
Yes. The Broadway & Baseline Rd stop on the FF1 line alone saw a total delay of 1,674 minutes on October 29th. Interestingly enough, the least delayed FF1 stops on October 29th were all in or closer to downtown Denver, while the most delayed stops were all in Boulder, close to CU Boulder. That same stop, Broadway & Baseline Rd, saw a 727-minute total delay on October 29th on the DASH line instead of FF1. Other bus lines also showed drastic stop-to-stop variation on October 29th, a day of severe inclement weather.
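That cross-line comparison for a single stop can be made directly by grouping on both line and stop name. A sketch with hypothetical numbers, not the real CORA data:

```python
import pandas as pd

# Hypothetical stop-level rows shaped like single_day_perf (values made up)
df = pd.DataFrame({'ROUTE_SHORT_NAME': ['FF1', 'FF1', 'DASH'],
                   'STOP_NAME': ['Broadway & Baseline Rd'] * 3,
                   'Performance': [1000.0, 675.0, 727.3]})

# Total delay at one stop, broken out by line
by_line = (df[df['STOP_NAME'] == 'Broadway & Baseline Rd']
           .groupby('ROUTE_SHORT_NAME')['Performance'].sum())
```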
Something that surprised me during the inspection of the data was the sheer number of stops that just one bus line makes in a single day. For example, SKIP alone runs over 105 northbound and 105 southbound trips per day, and there are 49 stops in that line. When I was calculating overall performance delays on days with inclement weather, I was seeing what seemed like unfathomable delays (6,000+ hours). But when divided over 10,000+ stop events for just one line, the delays seemed much more minor.
There were a few things I wish I could have accomplished during this project that I was unable to. If I were to continue working on this project I would do the following:
Gather additional data sources for factors that could also affect bus performance. Traffic data, along with data on local events (such as sporting events) that cause high ridership, would help explain variance in bus performance.
Find a way to "fairly" compare bus lines that have different numbers of stops. If a line is running late due to bad weather and that line has more stops, it will always appear to perform worse than other lines. I would like a fairer way to compare lines like FF1 and DASH.
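One simple version of that fairer comparison would be the mean delay per stop event rather than a raw sum, so a line with more stops is not automatically penalized. A sketch with hypothetical numbers:

```python
import pandas as pd

# Made-up delays: FF1 has fewer stop events but a larger delay at each one
df = pd.DataFrame({'ROUTE_SHORT_NAME': ['FF1', 'FF1', 'DASH', 'DASH', 'DASH', 'DASH'],
                   'Performance': [20.0, 10.0, 9.0, 9.0, 9.0, 9.0]})

total = df.groupby('ROUTE_SHORT_NAME')['Performance'].sum()     # raw sum penalizes the longer line
per_stop = df.groupby('ROUTE_SHORT_NAME')['Performance'].mean() # mean corrects for stop count
```

Here DASH looks worse by total delay (36 vs 30 minutes) but FF1 is worse per stop event (15 vs 9 minutes).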
Use spatial data to create maps/choropleths that could give better insight into the data.
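A starting point for that spatial work, assuming the per-stop delay totals are joined with the lat/long columns from stops_df (the coordinates and delay value below are made up for illustration):

```python
import pandas as pd
import geopandas as gpd

# Hypothetical per-stop totals with coordinates; in the notebook these would
# come from merging a *_grouped frame with stops_df on the shared STOP_NAME column
stop_perf = pd.DataFrame({'STOP_NAME': ['Broadway & Baseline Rd', 'Broadway & Euclid Ave'],
                          'stop_lat': [40.000, 40.004],
                          'stop_lon': [-105.281, -105.280],
                          'Performance': [1674.97, 1537.35]})

# Build point geometries from the lon/lat columns
gdf = gpd.GeoDataFrame(stop_perf,
                       geometry=gpd.points_from_xy(stop_perf['stop_lon'], stop_perf['stop_lat']),
                       crs='EPSG:4326')
# gdf.plot(column='Performance', legend=True)  # points colored by total delay
```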